-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prevent log message truncation #6554
Conversation
src/common/libflux/flog.c
Outdated
/* Copy va_list ap in case it reused on truncation below: | ||
*/ | ||
va_copy (ap_copy, ap); | ||
|
||
if (vsnprintf (result, sizeof (buf), fmt, ap) >= sizeof (buf)) { | ||
/* Message truncation occurred. Fall back to malloc. If that | ||
* fails, just print the truncated message: | ||
*/ | ||
if (vasprintf (&mbuf, fmt, ap_copy) == 0) | ||
result = mbuf; | ||
} | ||
rc = fprintf (stderr, "%s: %s\n", lstr, result); | ||
free (mbuf); | ||
return rc < 0 ? -1 : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
va_copy(3) says
Each invocation of va_copy() must be matched by a corresponding invocation
of va_end() in the same function.
Add one after the conditional block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh! Thanks for catching this one!
/* Copy va_list ap in case it reused on truncation below: | ||
*/ | ||
va_copy (ap_copy, ap); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here
src/common/libflux/flog.c
Outdated
* If this fails for any reason, fall back to truncated output. | ||
*/ | ||
if (n >= sizeof (ctx->buf)) { | ||
int size = n++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side effect on n
seems not to have a purpose?
src/broker/shutdown.c
Outdated
@@ -66,19 +66,14 @@ static int forward_logbuf (flux_t *h, | |||
struct stdlog_header hdr; | |||
const char *txt; | |||
size_t txtlen; | |||
char buf[FLUX_MAX_LOGBUF]; | |||
int len = strlen (stdlog); | |||
char buf[len + 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is unbounded, are we concerned about blowing the stack if the system conjurs a really big log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct, that case should be handled. I'm not sure if it would just be easier to just use the heap here, since this is a limited case. Otherwise, the VLA size could be replaced with a macro that takes the min of len + 1
and some sane maximum (4K?) to avoid blowing up the stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably heap? Otherwise the sane max just becomes the new FLUX_MAX_LOGBUF (albeit just in this case!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fixed that and addressed the other comments. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Nothing major in my comments so I'll approve and let you address whatever you think is worthwhile.
Problem: Flux log messages may be truncated because all messages are formated into an internal fixed size buffer of FLUX_MAX_LOGBUF (2048) bytes. On large clusters, this can result in regular loss of important information in the logs. Attempt to allocate a large enough buffer for the full log message on the heap when truncation is detected in flux_vlog(3). Do this for both the case where the log destination is stderr (h == NULL), or the message is being forwarded to a handler or via an RPC.
Problem: There is no test of logging a large message to stderr in the libflux log unit tests. Add a test that logs a message > FLUX_MAX_LOGBUF. Note: Since the log message goes to stderr, i.e. the output of the unit test itself, we can't ensure the message was not truncated. However, this is still useful to ensure the test does not die, and that no leaks are present (i.e. by running the test under valgrind)
Problem: No tests in the testsuite ensure that Flux log messages are not truncated at FLUX_MAX_LOGBUF (2048) bytes. Add a test to t0009-dmesg.t that logs a message much larger than 2048 bytes and ensures the result is not truncated.
Problem: In the broker shutdown code, log messages may be truncated because the log forwarding code uses a static buffer of FLUX_MAX_LOGBUF. Since a maximum size of the final log message is known, use a variable length array for the buffer in forward_logbuf() instead of FLUX_MAX_LOGBUF to avoid any truncation.
Problem: libflux/flog.h exposes FLUX_MAX_LOGBUF via the public API, but there are currently no users. Furthermore, the implementation now falls back to a heap-allocated buffer if the internal fixed-size buffer causes log message truncation, so exposing the internal buffer size is misleading. Move the definition of FLUX_MAX_LOGBUF out of the public API.
Thanks! Setting MWP now. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6554 +/- ##
=======================================
Coverage 79.44% 79.44%
=======================================
Files 531 531
Lines 88196 88224 +28
=======================================
+ Hits 70063 70092 +29
+ Misses 18133 18132 -1
|
This fixes #6545 by falling back to a malloc'd buffer if truncation is detected when handling log messages.
Since the internal fixed buffer size is no longer relevant, the definition of
FLUX_MAX_LOGBUF
is also removed from the public interface. However, I'm fine dropping that commit if it goes too far.Fixes #6545.